Synthetic genes and bacterial plasmids devoid of CpG

ABSTRACT

The invention relates to a new series of bacterial plasmid vectors which are fully devoid of CpG and which can express synthetic genes that do not contain CpG in the bacterium Escherichia coli.

This application is the US national phase of international application PCT/FR02/00862 filed 11 Mar. 2002, which designated the US.

FIELD OF THE INVENTION

The present application relates to synthetic genes and to plasmids entirely devoid of CpG.

TECHNOLOGICAL BACKGROUND

Plasmids are genetic elements found essentially in bacteria, made up of a molecule of deoxyribonucleic acid which is most commonly circular and whose replication is autonomous and independent of that of the genomic DNA. Natural plasmids isolated from a very broad variety of bacteria are capable of accomplishing several cellular functions. The first, which is vital for all plasmids, is their replication, generally carried out in synchrony with replication of the genomic DNA and with cell division. Besides the region required for replication, all natural plasmids carry genes encoding proteins whose function most commonly remains unknown, because these genes have received little scientific investigation. The number of genes present on a plasmid determines its size, the smallest natural plasmids containing only two to three genes. The properties of plasmids attracted research scientists very early on, who made them vehicles for transporting and expressing genes in prokaryotic as well as eukaryotic cells. The very rapid progress observed over the past two decades in the molecular biology of nucleic acids and proteins can in part be attributed to the exploitation of recombined plasmids constructed from fragments of natural DNA of plasmid origin, from other cellular DNAs, or even from chemically synthesized DNA.

The four bases adenine (A), guanine (G), cytosine (C) and thymine (T) which constitute deoxyribonucleic acid (DNA) are distributed in 16 dinucleotide configurations, namely CG, GC, TA, AT, CC, GG, TT, AA, TG, CA, AG, CT, AC, GT, GA and TC. Analysis of the qualitative distribution of the dinucleotides in the DNA of thousands of plasmids whose sequences are known reveals that all 16 dinucleotides are always present, in natural plasmids and in plasmids constructed in the laboratory alike. However, analysis of the quantitative distribution of the dinucleotides of plasmids shows great disparities, which depend only partly on the percentage of each of the four bases in the DNA. Specifically, for a given plasmid, comparing the observed frequency of each dinucleotide with the frequency calculated on the basis of a random association between two bases can reveal major differences for several dinucleotides, in the form of either over-representation or under-representation (Campbell A., Mrazek J. and Karlin S. (1999) Proc Natl Acad Sci USA 96, 9184-9). The differences observed in the distribution of certain dinucleotides (not always the same ones) in natural plasmids isolated from bacteria of phylogenetically distant species have been explained by differences in the specificity of the repair, recombination and replication mechanisms acting on cellular DNA.
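The observed-versus-expected comparison described above can be sketched in a few lines. This is a simplified, hypothetical helper that reads a single strand of a linear sequence; the cited analysis also takes both strands into account, so its published values need not match this sketch exactly.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq: str) -> dict:
    """Observed/expected ratio for each of the 16 dinucleotides of a linear,
    single-stranded reading of `seq`.

    The expected frequency assumes random association of bases: f_X * f_Y.
    A ratio above 1 indicates over-representation, below 1 under-representation.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (base_counts[x] / n) * (base_counts[y] / n)
        observed = pair_counts[x + y] / (n - 1)
        ratios[x + y] = observed / expected if expected else float("nan")
    return ratios


# CpG (CG) versus GpC (GC) in a short test sequence.
r = dinucleotide_relative_abundance("ACGTACGT")
print(round(r["CG"], 3), r["GC"])  # prints: 4.571 0.0
```

A plasmid that is completely devoid of CpG, in the sense of this application, is one for which the CG ratio is exactly 0.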

Gene transfer in vitro into cells in culture and in vivo into various animals is a common and rapidly developing practice, on the one hand to gain a better understanding of cell function and, on the other hand, to apply these techniques to cell and gene therapies. None of the viral and plasmid vectors available for gene transfer in animals has gained a decisive advantage over the others, since each has advantages but also disadvantages. There is, however, one application in which naked plasmid DNA, or plasmid DNA complexed with various substances that facilitate its transport to the nucleus, is the subject of intense research activity, namely DNA immunization. DNA immunization is based on the immune responses observed in laboratory animals treated with plasmid DNA encoding an antigenic peptide, by intramuscular or intradermal injection or by inhalation. It is now well established that a first consequence of introducing plasmid DNA derived from the bacterium E. coli into an experimental animal, intravenously or intramuscularly, is the rapid production of various cytokines by the sentinel cells of the immune system (Krieg A. M. and Kline J. N. (2000) Immunopharmacology 48, 303-305). This response is highly specific for bacterial DNA, since DNA extracted from animal cells does not induce cytokines under the same conditions. The cellular mechanisms involved in this immune response are far from fully understood. However, it is known that the discrimination between bacterial DNA and DNA of animal origin relies on structural differences relating to the methylation of certain cytosines in the molecule.
Specifically, mammalian DNA is naturally methylated at the cytosine of all CG dinucleotides (subsequently written CpG), with the exception of short regions of high CpG density, called CpG islands, present in functional regions in some promoters. DNA extracted from E. coli does not exhibit this type of methylation due to the absence of the enzyme activity capable of accomplishing this modification in this bacterium. It is, however, possible to methylate the CpGs of plasmid DNA extracted from E. coli in a test tube with an appropriate enzyme. Under these conditions, DNA methylated in vitro loses a great deal of its immunostimulant activity compared to control nonmethylated DNA. The E. coli strain K12, from which virtually all the mutant strains used to produce plasmid DNAs are derived, contains an enzyme activity (DNA methylase dcm (Palmer B. R. and Marinus M. G. (1994) Gene 143, 1-12)) which leads to methylation of cytosine occurring in the nucleic acid context CC(A/T)GG. All plasmid vectors for gene transfer contain this sequence in varying number and, as a result, their DNA molecule contains methylated cytosines which are not found in mammalian DNA. This form of methylation specific to E. coli thus introduces another difference into the cytosine methylations between bacterial DNA and that of mammals, which might contribute to the immunostimulant capacity of plasmid DNA.
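The dcm sites that would be methylated in a dcm⁺ K12 strain can be located by a simple pattern scan. The sketch below uses hypothetical helper names and is not part of the original work; it relies on the commonly documented behavior of the Dcm methylase, which modifies the internal (second) cytosine of CCWGG, where W is A or T.

```python
import re

# Dcm recognition site CC(A/T)GG, i.e. CCWGG. The zero-width lookahead
# reports every start position, including any overlapping matches.
DCM_SITE = re.compile(r"(?=CC[AT]GG)")


def dcm_sites(seq: str) -> list:
    """0-based start positions of CCWGG sites on the given (top) strand."""
    return [m.start() for m in DCM_SITE.finditer(seq.upper())]


def dcm_methylated_cytosines(seq: str) -> list:
    """0-based positions of top-strand cytosines expected to be methylated
    (the second C of each site)."""
    return [start + 1 for start in dcm_sites(seq)]


seq = "AACCAGGTTCCTGGAA"
print(dcm_sites(seq), dcm_methylated_cytosines(seq))  # prints: [2, 9] [3, 10]
```

Because the reverse complement of CCAGG is CCTGG, the site reads the same on both strands, so scanning one strand finds every site; in a dcm⁺ host each site carries one methylated cytosine on each strand.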

The CpG frequency in primate and rodent DNA is, overall, much lower than expected on the basis of the frequencies of cytosine and guanine. For a given DNA fragment, the extent of the CpG deficiency depends on the biological role of the fragment: intergenic regions contain only a fifth of the expected frequency, exons have a less marked deficiency and, at the other extreme, some promoters containing a large CpG island exhibit a CpG percentage close to that expected. Analysis of the sequence data for human cDNAs and chromosomes nevertheless reveals broad heterogeneity in CpG frequency among promoter regions and cDNAs. This is illustrated by the cDNA of the human gene encoding interleukin 2, which has just one CpG. Similarly, the portion of the promoter of this gene containing the TATA box does not contain any CpG, whereas the upstream portion, rich in transcription factor recognition sites, does contain CpGs. The regions located 3′ of genes, formed by the 3′ UTRs (untranslated regions) and the polyadenylation and transcription termination sequences, are rather poor in CpG. In humans, it is not unusual to find CpG-free regions immediately downstream of genes. However, the human sequencing data available at the end of 2000 did not reveal a single transcriptional unit (transcription promoter regions, a gene with or without introns, and the polyadenylation region) that is completely devoid of CpG. The situation in E. coli is quite different from that in animal cells, since the frequency of CpGs in the genomic DNA of this bacterium is slightly higher than the calculated frequency. The same is true for the CpGs of natural plasmids isolated from hospital strains of E. coli.
The recombined plasmids resulting from genetic manipulations, used for gene transfer, exhibit variations in their CpG numbers which depend on the origin of the fragments inserted into the vector. Analysis of the sequences of several tens of recombined E. coli plasmids randomly taken from the GenBank databank shows that the plasmids most lacking in CpG have, at the very most, a 50% deficiency in the number of their CpGs.

As regards the present invention, it provides products and methods for synthesizing plasmid DNA in E. coli which is completely devoid of CpG and in which the cytosines placed in the context CC(A/T)GG are not methylated. To the applicant's knowledge, this is the first description of such products which exhibit such a structure while at the same time having conserved their function.

DESCRIPTION OF THE INVENTION

The present invention provides means for producing plasmids which are functional in a prokaryotic organism such as Escherichia coli, but which are nevertheless completely devoid of CpG. More particularly it provides means for producing plasmids which are completely devoid of CpG, and which are also free of cytosine methylation in the nucleic acid context CC(A/T)GG.

The present application thus relates to methods for producing such plasmids, and also to the elements constituting these plasmids, namely genes devoid of CpG which can be expressed in E. coli, promoters devoid of CpG which are suitable for the expression of said genes, and origins of replication devoid of CpG which are suitable for the bacterial transformation of said plasmids. The present application is also directed towards the biotechnological and medical applications of these products. Each one of these products has the particular characteristic of being completely devoid of CpG, while at the same time having conserved its functionality in a prokaryote such as E. coli. The present application also provides an E. coli strain specially suited to the production of the plasmids according to the invention, this strain having the particular characteristic of allowing stable replication of these plasmids and of the genetic material which they transport, without impairing their function, and without, however, inducing methylation at CC(A/T)GG sites (strain comprising an inactivated dcm gene).

One of the common concepts linking the various aspects of the invention is therefore to make it possible to produce plasmids which are completely devoid of CpG and which have, despite everything, conserved their functional properties in a prokaryote such as E. coli. To the applicant's knowledge, this is the first description of such means.

The present application is thus directed toward a method for producing a plasmid which is a vector of at least one gene, and which is completely devoid of CpG characterized in that a plasmid is constructed by assembling, by enzyme ligation, DNA fragments, all devoid of CpG, corresponding to an origin of replication for the plasmid and to elements constituting a transcriptional unit for said at least one gene, and in that this plasmid is transferred into an Escherichia coli strain expressing the pi protein for replication of the plasmid.
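Requiring every fragment to be CpG-free is necessary but not sufficient: ligating a fragment that ends in C to one that begins with G creates a new CpG at the junction, including the junction that closes the circle. A minimal sketch of this check follows; the function names are hypothetical, and it assumes the fragments are given exactly as they appear in the final plasmid, with cohesive-end overhangs already resolved.

```python
def cpg_positions(seq: str, circular: bool = False) -> list:
    """0-based positions i where seq[i:i+2] == 'CG'.

    For a circular molecule the junction between the last and the first
    base is also checked (reported at position len(seq) - 1).
    """
    s = seq.upper()
    if circular and s:
        s += s[0]
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def check_cpg_free_assembly(fragments: list) -> str:
    """Join fragments head-to-tail into a circle and verify that neither the
    fragments themselves nor the ligation junctions contain a CpG."""
    for i, frag in enumerate(fragments):
        if cpg_positions(frag):
            raise ValueError(f"fragment {i} contains a CpG")
    plasmid = "".join(f.upper() for f in fragments)
    # Fragments are CpG-free, so any remaining hit lies on a junction.
    junction_hits = cpg_positions(plasmid, circular=True)
    if junction_hits:
        raise ValueError(f"CpG created at ligation junction(s): {junction_hits}")
    return plasmid


print(check_cpg_free_assembly(["AATTG", "CCAA"]))  # prints: AATTGCCAA
```

With the toy fragments `["AATTC", "GGAA"]` the call raises, because the C ending the first fragment meets the G starting the second.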

Plasmids isolated from wild-type bacterial strains generally accomplish three functions in relation to replication, namely initiation of DNA replication, control of replication and stable maintenance of the plasmid during successive divisions. Plasmids constructed in the laboratory do not always exhibit all of these functions. The number of copies of the plasmids is, for example, quite often increased compared to the parent plasmid, denoting that replication control elements have been modified. The plasmid R6K contains three origins of replication, alpha, gamma and beta, linked on the same DNA fragment (Filutowicz M. and Rakowski S. A. (1998) Gene 223, 195-204). Each one of the origins is activated by the R6K specific pi initiation protein encoded by the R6K pir gene. In order to be functional, the three origins need a 277 bp sequence, known by those skilled in the art as “core”, located at the center of the fragment carrying the three origins, and also an additional single fragment positioned in cis, i.e. present on the same DNA molecule. When the sequences of the alpha and beta origins are deleted, the remaining gamma origin allows autonomous duplication of the plasmid on the condition that the pir gene is present in cis on the plasmid or in trans on the chromosome of the bacterium. The inventors chose to focus more particularly on the smallest of the three origins, namely the gamma origin, which has the advantage of containing all the elements required for controlled replication of the plasmid, namely the core and an adjacent activating sequence. The core is made up of a pi protein-binding sequence repeated 7 times, and an AT-rich sequence. The activating region contains binding sites for several cellular proteins of the bacterium, required for stably maintaining the plasmid. 
The number of copies of plasmids containing only the gamma origin depends on the pi protein; mutant forms of pi leading to a large increase in plasmid copy number have been isolated and characterized. As shown in greater detail in the examples below, the inventors have succeeded in constructing, from the gamma origin of replication of the plasmid R6K, origins of replication which no longer contain any CpG while retaining their full functionality. It may be noted that choosing R6K gamma as the starting material, that is, a small replicon containing only a small number of CpGs, is particularly relevant: when other plasmids such as those of the pUC series are used as the starting material, functional CpG-free plasmids are not obtained. All the attempts made by the inventors to chemically reconstitute the minimal pUC sequence by replacing the cytosines of CpGs with guanine or adenine resulted in DNA fragments which had lost all functional replication activity. The plasmids comprising a CpG-free origin of replication according to the invention, by contrast, have retained their ability to replicate stably in a prokaryotic cell such as E. coli, and E. coli K12 in particular, provided of course that they are supplied with the pi protein required to activate replication (wild-type pir or a mutated pir such as pir 116, in cis or in trans). An origin of replication for a plasmid according to the invention is characterized in that its sequence corresponds to that of the R6K gamma origin of replication in which each G of the CpGs of the repeat region of the core has been replaced with an A, a C or a T, or each C of the CpGs has been replaced with a G, an A or a T. Various CpG-free origins of replication have thus been obtained which, surprisingly, are still capable of performing the functions of a plasmid origin of replication in E. coli and, what is more, of performing these functions for genes and transcriptional units which are themselves devoid of CpG. The examples below give some illustrations (cf. origins R6K gamma M2A, R6K gamma M2C and R6K gamma M2T in examples 7-10). The present application is directed more particularly toward any origin of replication whose sequence comprises the sequence SEQ ID No. 12 or SEQ ID No. 13 (FIGS. 12 and 14). It has also been demonstrated that the pi protein-binding sequence need not be repeated 7 times, as in the standard R6K gamma origin with CpG: the number of repeats can be limited to 5 or 6 without impairing the functions of the origin of replication. The present application is thus also directed toward any origin of replication according to the invention as defined above which comprises only 5 or 6 repeats of the pi protein-binding sequence. By virtue of these functional CpG-free origins of replication, the inventors have been able to construct various plasmids which, notably, have retained their transfection vector functions.
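The substitution rule for the origin (replace the G of each CpG with A, C or T, or the C with G, A or T) can be applied mechanically, as in the hypothetical sketch below. The toy sequences are not SEQ ID No. 12 or 13, and whether a given variant still functions as an origin can only be established experimentally, as the pUC results above show.

```python
def has_cpg(seq: str) -> bool:
    return "CG" in seq.upper()


def substitute_cpg(seq: str, base: str, target: str = "G") -> str:
    """Replace the G (target='G') or the C (target='C') of every CpG in `seq`
    with `base`. Allowed bases: A/C/T in place of G, G/A/T in place of C."""
    seq = seq.upper()
    allowed = {"G": "ACT", "C": "GAT"}
    if target not in allowed or base not in allowed[target]:
        raise ValueError("invalid target/base combination")
    s = list(seq)
    # CpG positions cannot overlap, so all of them are located first
    # and then edited independently.
    hits = [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
    for i in hits:
        if target == "G":
            s[i + 1] = base
        else:
            s[i] = base
    return "".join(s)


print(substitute_cpg("TCGGA", "A"))  # prints: TCAGA (CpG-free)
print(substitute_cpg("TCGGA", "C"))  # prints: TCCGA (a new CpG appears)
```

The second call shows why each variant must be re-checked: replacing a G with C in front of another G, or a C with G after another C, recreates a CpG one position away.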

The creation of E. coli plasmids devoid of CpG necessarily requires functional genes (which can be expressed in prokaryotes such as E. coli) that do not contain any CpG. For example, selecting bacteria transformed with a recombined plasmid DNA involves a gene whose protein confers a dominant advantage on the bacterium. Most commonly, the selective marker is a gene conferring resistance to an antibiotic active against E. coli. Analysis of the wide variety of resistance genes used in E. coli shows that, without exception, they all contain CpGs, very often in very high numbers in the case of resistance genes originating from the Streptomyces species which produce the selection antibiotic. Similarly, reporter genes are needed which are CpG-free while remaining functional.

Analysis of several hundred chromosomal and plasmid genes of E. coli, whose well-characterized sequences are available in several databanks, reveals that all genes greater than 250 bp in size contain, without exception, all 16 dinucleotides.

The present invention demonstrates that it is, despite everything, possible to construct genes which are functional in E. coli and which are devoid of CpG. The inventors have in fact developed a method for obtaining genes devoid of CpG while at the same time being able to be expressed in E. coli. This method is based on the synthesis of a polynucleotide chain by following the amino acid chain of a protein which can be expressed in E. coli, assigning to each amino acid a nucleotide codon chosen from those which, according to the genetic code, and taking into account the degeneracy of this code, correspond to this amino acid, but eliminating from this choice:

-   i. all codons containing a CpG in their sequence: this concerns the codons ACG (Thr), CCG (Pro), GCG (Ala), TCG (Ser), CGA (Arg), CGC (Arg), CGG (Arg) and CGT (Arg), and
-   ii. any codon ending with a C when the codon immediately following it begins with a G.

Examples of a gene thus obtained comprise the NeoΔCpG gene (SEQ ID No. 316; cf. example 11).
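Rules i and ii can be checked mechanically against the standard genetic code. The sketch below (hypothetical helper names) enumerates the codons excluded by rule i and tests a codon sequence against rule ii; it confirms that rule i removes exactly the eight codons listed, and only codons for Thr, Pro, Arg, Ala and Ser.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ... ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def excluded_by_rule_i() -> dict:
    """Rule i: codons that contain a CpG, with the amino acid each encodes."""
    return {c: aa for c, aa in sorted(CODON_TABLE.items()) if "CG" in c}


def violates_rule_ii(codons: list) -> bool:
    """Rule ii: a codon ending in C directly followed by a codon starting with G."""
    return any(a[-1] == "C" and b[0] == "G" for a, b in zip(codons, codons[1:]))


print(excluded_by_rule_i())
# prints: {'ACG': 'T', 'CCG': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R', 'GCG': 'A', 'TCG': 'S'}
```

Rule ii is needed because a CpG can also straddle two codons, as in GCC followed by GGT (GCC|GGT).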

According to one variant of implementation of the invention, the codons for which the frequencies are low in proteins of human origin will also be eliminated from said choice: this concerns the codons ATA (Ile), CTA (Leu), GTA (Val) and TTA (Leu). The set of possible codons therefore, according to this variant, corresponds to the following set:

-   GCA (Ala), GCC (Ala), GCT (Ala), AGA (Arg), AGG (Arg), AAC (Asn), AAT (Asn), GAC (Asp), GAT (Asp), TGC (Cys), TGT (Cys), CAA (Gln), CAG (Gln), GAA (Glu), GAG (Glu), GGA (Gly), GGC (Gly), GGG (Gly), GGT (Gly), CAC (His), CAT (His), ATC (Ile), ATT (Ile), CTC (Leu), CTG (Leu), CTT (Leu), TTG (Leu), AAA (Lys), AAG (Lys), TTC (Phe), TTT (Phe), CCA (Pro), CCC (Pro), CCT (Pro), TCA (Ser), TCC (Ser), TCT (Ser), AGC (Ser), AGT (Ser), ACA (Thr), ACC (Thr), ACT (Thr), TAC (Tyr), TAT (Tyr), GTC (Val), GTG (Val), GTT (Val),

to which rule ii. above should of course be applied. Examples of a gene obtained in accordance with this variant of implementation comprise in particular the LacZΔCpG gene (positions 3 to 3056 of SEQ ID No. 9; cf. example 5).
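Codon choice under both rules, using the reduced codon set listed above, can be sketched as a greedy back-translation with one residue of look-ahead. The order of codons within each list is a hypothetical preference, not taken from the source; ATG (Met), TGG (Trp) and the stop codons are added because the set does not restate these CpG-free codons. The sketch does not address the mRNA structures mentioned in the next paragraph.

```python
from typing import Optional

# Reduced codon set of the variant above (hypothetical preference order:
# the first codon of each list is tried first).
ALLOWED = {
    "A": ["GCC", "GCT", "GCA"], "R": ["AGA", "AGG"], "N": ["AAC", "AAT"],
    "D": ["GAC", "GAT"], "C": ["TGC", "TGT"], "Q": ["CAG", "CAA"],
    "E": ["GAG", "GAA"], "G": ["GGC", "GGA", "GGG", "GGT"], "H": ["CAC", "CAT"],
    "I": ["ATC", "ATT"], "L": ["CTG", "CTC", "CTT", "TTG"], "K": ["AAG", "AAA"],
    "M": ["ATG"], "F": ["TTC", "TTT"], "P": ["CCC", "CCT", "CCA"],
    "S": ["AGC", "TCC", "TCT", "TCA", "AGT"], "T": ["ACC", "ACT", "ACA"],
    "W": ["TGG"], "Y": ["TAC", "TAT"], "V": ["GTG", "GTC", "GTT"],
    "*": ["TAA", "TGA", "TAG"],
}


def junction_ok(prev: Optional[str], codon: str) -> bool:
    """Rule ii: no codon ending in C directly before a codon starting with G."""
    return prev is None or not (prev[-1] == "C" and codon[0] == "G")


def back_translate(protein: str) -> str:
    """Greedy CpG-free back-translation of a one-letter protein sequence."""
    protein = protein.upper()
    codons = []
    prev = None
    for i, aa in enumerate(protein):
        nxt = ALLOWED[protein[i + 1]] if i + 1 < len(protein) else None
        for codon in ALLOWED[aa]:
            if not junction_ok(prev, codon):
                continue
            # Look ahead: keep at least one legal codon for the next residue.
            if nxt is not None and not any(junction_ok(codon, c) for c in nxt):
                continue
            break
        else:
            raise ValueError(f"no legal codon for residue {i} ({aa})")
        codons.append(codon)
        prev = codon
    dna = "".join(codons)
    if "CG" in dna:  # cannot happen when both rules are respected
        raise RuntimeError("CpG introduced")
    return dna


print(back_translate("MAG"))  # prints: ATGGCTGGC
```

In the example, GCC is rejected for Ala because every Gly codon starts with G, so the next preferred codon, GCT, is used instead. Since every amino acid in the set has at least one codon not ending in C, the look-ahead always leaves a legal choice.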

Preferably, said choice of codon will also be made so as to avoid structures which are unfavorable for the messenger RNA, such as the presence of splice sequences, of direct or inverted repeat sequences, of stem-loop structures or of polyadenylation signals. The number and the size variety of the genes synthesized by this method are illustrated in the examples below, which show that it is thus possible to envision the synthesis of genes completely devoid of CpG which nevertheless remain functional in E. coli. As a reference protein, any protein which can be expressed by E. coli can be chosen, for example a protein encoded by a gene for resistance to an antibiotic, such as the genes for resistance to zeocin® (phleomycin), to hygromycin, to blasticidin or to puromycin, or a protein encoded by reporter genes such as lacZ.

The present application is also directed toward such a method for obtaining genes devoid of CpG, which can be expressed in E. coli, and also any gene of at least 250 bp which can be obtained using this method. More particularly, the present application is directed toward:

-   any gene the sequence of which comprises the sequence SEQ ID No. 1 from position 3 to position 374 (FIG. 1), the sequence SEQ ID No. 3 from position 3 to position 1025 (FIG. 3), the sequence SEQ ID No. 5 from position 3 to position 422 (FIG. 5) or the sequence SEQ ID No. 7 from position 3 to position 599 (FIG. 7), and also any use of these genes as selection markers, and
-   any gene the sequence of which comprises the sequence SEQ ID No. 9 from position 3 to position 3056 (cf. FIG. 9), and any gene the sequence of which comprises the sequence SEQ ID No. 316 (positions 3 to 797 of the DNA sequence presented in FIG. 18, encoding SEQ ID No. 317), and also any use of such a gene as a reporter gene.

Expression of a plasmid gene also requires promoters suited to the cell harboring the plasmid. The fact that the complete E. coli genome has been known for a few years has facilitated the study of various noncoding elements with specific functions. Results of studies on the nature of E. coli promoters are continually updated and made public on the PromEC site, accessible via the Internet (http://bioinfo.md.huji.ac.il/marg/promec). An analysis of the 471 well-defined promoters, from base −75 to +25 relative to the transcription initiation point +1, reveals that only 6 of them contain no CpG. When each of these 6 promoters was synthesized chemically and placed upstream of the lacZ gene encoding E. coli β-galactosidase, no activity of this reporter gene was detected. The lack of strong homology with the consensus sequences of the canonical −10 and −35 boxes suggests that these promoters are weak promoters or, alternatively, that they may be regulated by induction conditions that remain to be defined for each of them. An analysis of the well-characterized promoters of E. coli has revealed that only about 10 contain no CpG in the boxes specifically recognized by RNA polymerase. A literature search has also revealed that these promoters are all inducible by various stimuli, a property which is sometimes desired but most often set aside in favor of constitutive expression. The inventors, for their part, have succeeded, by random PCR assembly of fragments carrying short CpG-free consensus sequences drawn from several strong promoters, in developing novel promoters which are suitable for the expression of CpG-free genes in E. coli, and which have the particular advantage of being very strong constitutive promoters while being completely devoid of CpG.
Example 6 below illustrates the construction of the novel promoter EM2K using this technology. The present application is more particularly directed toward any promoter, the sequence of which comprises the sequence SEQ ID No. 11 (cf. FIG. 11).
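Random assembly of this kind typically relies on degenerate oligonucleotides, and a degenerate position next to a C or a G can regenerate a CpG in part of the pool. As a hedged illustration of that design constraint, the sketch below expands a degenerate oligonucleotide written in IUPAC codes and reports which members of the pool are CpG-free; the function names and example oligonucleotides are arbitrary and are not the sequences used for EM2K.

```python
from itertools import product

# IUPAC nucleotide codes (single-letter degenerate bases).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate: str) -> list:
    """All concrete sequences encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate.upper()))]


def cpg_free_fraction(degenerate: str):
    """CpG-free members of the pool, and their share of the whole pool."""
    variants = expand(degenerate)
    free = [s for s in variants if "CG" not in s]
    return free, len(free) / len(variants)


print(cpg_free_fraction("AYG"))  # prints: (['ATG'], 0.5)
```

A degenerate position whose fixed neighbor is a preceding C can be restricted to H (A/C/T), and one whose fixed neighbor is a following G to D (A/G/T); adjacent degenerate positions still need a joint check such as the one above.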

The characterized transcription terminators in E. coli are made of short sequences, several of which do not possess any CpG, and the inventors have been able to verify that such terminators effectively perform their function when they are associated, in E. coli, with a CpG-free promoter and gene according to the invention.

The present application is thus directed toward any transcriptional unit which comprises at least one CpG-free gene according to the invention, and at least one CpG-free promoter according to the invention. Such a transcriptional unit may also comprise at least one CpG-free terminator. The invention thus provides, for the first time, a nucleotide group which is completely devoid of CpG, and which can nevertheless be expressed in E. coli, performing its normal functions therein.

The present application is thus directed toward any plasmid which comprises an origin of replication according to the invention. Such plasmids may also comprise a CpG-free gene according to the invention and/or a CpG-free promoter according to the invention and/or a CpG-free transcription terminator, or a transcriptional unit according to the invention. The plasmids according to the invention therefore have the advantage of being able to exhibit no CpG in their structure, while at the same time still being capable of performing expression vector functions. Examples of such plasmids are given in the examples below.

The present application is more particularly directed toward any plasmid of SEQ ID No. 14 (FIG. 15).

Any cell transformed with at least one element selected from the group consisting of the CpG-free genes according to the invention, the CpG-free promoters according to the invention, the CpG-free origins of replication according to the invention and the CpG-free plasmids according to the invention also falls within the field of the present application. Such a cell according to the invention may also comprise a gene encoding a pi protein, such as wild-type pir or a mutated pir such as pir 116. Advantageously, a transformed cell according to the invention is an E. coli cell.

To replicate a sufficient number of functional copies of a plasmid according to the invention, those skilled in the art have many bacteria at their disposal, such as, for example, the E. coli K12 bacteria conventionally used for plasmid replication. The original K12 strain of E. coli has a DNA methylase which introduces a methyl group onto cytosines placed in the context CC(A/T)GG in the genomic and plasmid DNA of the bacterium. All the strains of the K12 line have this activity, due to a methylase encoded by the dcm gene (Palmer B. R. and Marinus M. G. (1994) Gene 143, 1-12). Since methylation of plasmid DNA prepared from dcm⁺ strains of E. coli modifies the DNA molecule in a way that is undesirable for gene transfer into eukaryotic cells, the inventors have developed a strain in which plasmids carrying the CpG-free R6K gamma origin according to the invention can function and from which plasmid DNA free of methylation at the dcm sites can be obtained. To this end, the inventors constructed a new gene by deleting the dcm gene from position +3 after the ATG to position −14 before the TGA, in the dcm gene of a pir 116 strain (cf. Example 10 below).

The dcm gene is located in a chromosomal region the sequence of which can be obtained via GenBank with the accession number D90835 (Kohara clone #344: 43.5-43.9 min).

A deletion of the gene has the additional advantage of preventing any reversion of the gene to the wild-type form. The dcm⁻ mutant strains produced by the inventors exhibit no negative phenotype which might impair bacterial growth or modify the quality and quantity of the plasmid DNA. More particularly, an optimized strain of E. coli has been constructed by targeted inactivation of the dcm gene in a parent strain expressing a pi protein mutated at a site that increases the copy number of CpG-free plasmids. This optimized strain allows abundant, high-quality production of the plasmid DNAs which are the subject of this invention, and which are devoid of CpG and free of methylation on the cytosines of the dcm sites. The present application is therefore directed toward any cell comprising a gene encoding the pi protein which is transformed with the deleted dcm gene according to the invention, and toward any method of replicating plasmids which comprises transforming such a cell with a plasmid and culturing the transformed cell under conditions suitable for replication of this plasmid.

The present application is thus directed toward a method for producing a plasmid completely devoid of CpG and free of methylation on cytosine in the nucleic acid context CC(A/T)GG, characterized in that a plasmid according to the invention is produced by replication in an Escherichia coli strain expressing the pi protein, which is deficient for the dcm methylation system.

Any kit for producing plasmids which comprises at least one cell according to the invention also falls within the field of the present application. These kits are particularly suitable for replicating plasmids according to the invention, as they prevent these plasmids, whose structure is devoid of CpG, from also being methylated on CC(A/T)GG during their replication.

The invention thus provides a complete set of CpG-free transformation means: CpG-free genes, CpG-free promoters, CpG-free transcriptional units, CpG-free plasmid origins of replication, CpG-free plasmids, and cells specially suited to replicating plasmids without methylation of cytosines. These novel means find direct applications in the genetic transformation of cells for biotechnological or medical purposes. Such products are in fact exceptionally well suited to the production of DNA vaccine compositions intended for humans or animals.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated by the following examples, in which reference is made to the figures:

FIG. 1: Sequence of the Sh ble ΔCpG gene (CpG-free),

FIG. 2: List of the oligonucleotides used for assembling the Sh ble ΔCpG gene,

FIG. 3: Sequence of the Hph ΔCpG gene,

FIG. 4: List of the oligonucleotides used to assemble the Hph ΔCpG gene,

FIG. 5: Sequence of the Bsr ΔCpG gene,

FIG. 6: List of the oligonucleotides used to assemble the Bsr ΔCpG gene,

FIG. 7: Sequence of the Pac ΔCpG gene,

FIGS. 8a, 8b and 8c: List of the oligonucleotides used to assemble the Pac ΔCpG gene,

FIGS. 9a and 9b: Sequence of the LacZ ΔCpG gene,

FIG. 10A: List of the oligonucleotides used to assemble the first third of the LacZ ΔCpG gene,

FIGS. 10b-1 and 10b-2: List of the oligonucleotides used to assemble the second third of the LacZ ΔCpG gene,

FIG. 10C: List of the oligonucleotides used to assemble the third third of the LacZ ΔCpG gene,

FIG. 11: Sequence of the EM7 promoter (1), of the degenerate oligonucleotides (2) used to construct the EM2K promoter, and of the CpG-free EM2K promoter (3),

FIG. 12: Sequence of the R6K gamma M2A origin of replication,

FIG. 13: List of the oligonucleotides used to assemble the R6K gamma M2A origin of replication,

FIGS. 14a, 14b and 14c: Sequence of the R6K gamma origin of the plasmid pGTR6Kneoc9 delimited by the PacI sites,

FIG. 15: Sequence of the plasmid pSh-LacZΔCpG,

FIG. 16: Map of the plasmid pShΔCpG,

FIG. 17: Map of the plasmid pSh-LacZΔCpG,

FIG. 18: Sequence of the NeoΔCpG gene (CpG-free) (positions 3 to 797 of the DNA sequence = SEQ ID No. 316; protein sequence = SEQ ID No. 317),

FIG. 19: Sequence of the oligonucleotides SEQ ID No. 318 to SEQ ID No. X used to assemble the NeoΔCpG gene.

EXAMPLE 1 Construction of the Sh ble Gene for Resistance to Zeocin Devoid of CpG

The Sh ble ΔCpG gene, the sequence of which is given in FIG. 1 (positions 3 to 377 of SEQ ID No. 1), was synthesized by assembling overlapping oligonucleotides (20-40 nucleotides in size), the sequences of which are given in FIG. 2. The assembly method is carried out in three steps: the first step consists of phosphorylation of the oligonucleotides of the coding strand; in the second step, all the oligonucleotides of both strands are combined by hybridization and ligation; and, in the final step, the gene is amplified by PCR. This method was successfully used to synthesize all the synthetic genes mentioned in Examples 1, 2, 3, 4 and 5. Details of the method are given for the Sh ble ΔCpG gene:

The 10 oligonucleotides of the coding strand, OL26199 to OL27099 (FIG. 2), are phosphorylated according to the following procedure: 1 μl of each oligonucleotide, taken up in water at 250 μM, is added to a microtube containing 15 μl of water, bringing the final solution to a total oligonucleotide concentration of 100 picomol per microliter. 5 μl of this solution are then mixed with 10 μl of 10-times concentrated polynucleotide kinase buffer, 0.4 μl of a 50 mM ATP solution, 85 μl of water and 1 μl of the enzyme (at 10 U/μl), and the entire mixture is incubated for 4 hours at 37° C. (solution A).

A solution of the oligonucleotides of the noncoding strand is made up by mixing 1 μl of each oligonucleotide (OL27199 to OL28199; cf. FIG. 2) and 1 μl of the oligonucleotide OL26099 (FIG. 2), to which solution 43 μl of water are added in order to obtain a final solution at 54 picomol per μl (solution B).
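The volumes given for solutions A and B can be checked with simple arithmetic. The Python sketch below assumes that the OL numbers are consecutive (so OL27199 to OL28199 is 11 oligonucleotides), that every stock is exactly 250 μM, and that pipetting is exact; the function and variable names are hypothetical.

```python
STOCK_PMOL_PER_UL = 250.0  # a 250 µM stock holds 250 pmol per µl


def pool(n_oligos: int, water_ul: float, ul_each: float = 1.0) -> tuple[float, float]:
    """Return (total, per-oligo) concentration in pmol/µl of an oligo pool."""
    volume = n_oligos * ul_each + water_ul
    per_oligo = STOCK_PMOL_PER_UL * ul_each / volume
    return n_oligos * per_oligo, per_oligo


# Pool for solution A: 10 coding-strand oligos (OL26199..OL27099) + 15 µl water
total_a, each_a = pool(10, 15.0)

# Kinase reaction: 5 µl pool + 10 µl buffer + 0.4 µl ATP + 85 µl water + 1 µl enzyme
kinase_ul = 5 + 10 + 0.4 + 85 + 1
each_in_solution_a = each_a * 5 / kinase_ul  # pmol/µl of each oligo in solution A

# Solution B: assumed 11 noncoding oligos (OL27199..OL28199) + OL26099, + 43 µl water
total_b, each_b = pool(12, 43.0)

# Amount of each oligonucleotide delivered to the assembly reaction
pmol_per_coding_oligo = 10 * each_in_solution_a  # from 10 µl of solution A
pmol_per_noncoding_oligo = 1 * each_b            # from 1 µl of solution B

print(f"pool for A: {total_a:.1f} pmol/µl total, {each_a:.1f} per oligo")
print(f"solution B: {total_b:.1f} pmol/µl total")
print(f"per oligo in assembly: coding {pmol_per_coding_oligo:.2f} pmol, "
      f"noncoding {pmol_per_noncoding_oligo:.2f} pmol")
```

Under these assumptions the pool for solution A comes out at the stated 100 pmol/μl, solution B at about 54.5 pmol/μl of total oligonucleotide, and the two strands enter the ligation in roughly equal amounts (about 4.9 and 4.5 pmol of each oligonucleotide).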

The gene is assembled by first mixing 10 μl of solution A, 1 μl of solution B, 6 μl of a 100 mM KCl solution, 3 μl of a 0.5% NP-40 solution, 4 μl of a 50 mM MgCl₂ solution, 3 μl of a 10 mM ATP solution and 7.5 μl of Pfu ligase (30 units). The mixture is then heated in a programmable thermocycler for 3 minutes at 95° C. and then 3 minutes at 80° C., before undergoing 3 cycles of 1 minute at 95° C., followed by a ramp from 95° C. to 70° C. in 1 minute, then a ramp from 70° C. to 55° C. in 1 hour and, finally, 2 hours at 55° C. The assembled oligonucleotides are then amplified with the primers OL26099 and OL27199. The amplification product is purified on a Promega column, digested with the NcoI and NheI restriction enzymes, and cloned into the plasmid pMOD1LacZ(wt) linearized with NcoI and NheI. The E. coli strain GT100 (available from Invivogen) was transformed with the ligation mixture of the vector fragment and the PCR fragment, and the plasmid DNA sequences of 2 zeocin-resistant clones were found to match the desired sequence given in FIG. 1. This synthetic gene, placed under the control of the bacterial EM7 promoter (vector pMOD1Sh ΔCpG), confers zeocin resistance identical to that provided by the same vector containing the native Sh ble gene in the recipient E. coli strain GT100.
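The final concentrations in the assembly reaction are not stated explicitly. The Python sketch below computes them by simple dilution from the listed volumes; it deliberately ignores the ATP, buffer salts and Mg²⁺ carried over from solution A and any salts in the ligase storage buffer, so the figures are approximate. The variable names are hypothetical.

```python
# (volume in µl, stock concentration) of each additive listed in Example 1
additives = {
    "KCl (mM)": (6.0, 100.0),
    "NP-40 (%)": (3.0, 0.5),
    "MgCl2 (mM)": (4.0, 50.0),
    "ATP (mM)": (3.0, 10.0),
}
# Components counted only for the volume they add
other_ul = {"solution A": 10.0, "solution B": 1.0, "Pfu ligase": 7.5}

total_ul = sum(v for v, _ in additives.values()) + sum(other_ul.values())

# Simple dilution: C_final = V_added * C_stock / V_total (carry-over ignored)
final = {name: v * c / total_ul for name, (v, c) in additives.items()}

# Slow annealing step: 70 °C -> 55 °C over 60 minutes
ramp_c_per_min = (70 - 55) / 60

print(f"reaction volume: {total_ul} µl")
for name, value in final.items():
    print(f"{name}: {value:.3g}")
print(f"annealing ramp: {ramp_c_per_min} °C/min")
```

With that simplification the 34.5 μl reaction is roughly 17 mM KCl, 5.8 mM MgCl₂ and 0.87 mM added ATP, and the 70° C. to 55° C. step is a ramp of 0.25° C. per minute.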

EXAMPLE 2 Construction of the Hph Gene for Resistance to Hygromycin Devoid of CpG

The synthetic Hph ΔCpG gene (SEQ ID No. 3, given in FIG. 3) was constructed according to the method described in Example 1. The two strands were synthesized using oligonucleotides of 60 bases plus two oligonucleotides of 30 bases, with an overlapping region of 30 bases. The oligonucleotides given in FIG. 4 were assembled and then amplified in a final PCR with the sense oligonucleotide TTCAGCTGAGGAGGCACATC (SEQ ID No. 299) and the reverse oligonucleotide CTCAGGATCCGCTAGCTAAT (SEQ ID No. 300), under the experimental conditions given in the preceding example.
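The 60/30 layout can be generated mechanically. The Python sketch below is a hypothetical helper, not the design procedure used for FIG. 4: it cuts the coding strand into consecutive 60-mers and the complementary strand into 60-mers offset by 30 bases, with a 30-mer at each end, so that every nick on one strand is bridged by 30 bases of complementarity on each side.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-cased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_oligos(seq: str, length: int = 60) -> tuple[list[str], list[str]]:
    """Split a duplex into staggered oligonucleotides for ligation assembly.

    Top (coding) strand: consecutive `length`-mers starting at position 0.
    Bottom strand: a `length // 2`-mer at each end with `length`-mers in
    between, so each top-strand nick lies in the middle of a bottom oligo.
    Bottom-strand oligos are returned 5'->3', i.e. reverse-complemented.
    """
    seq = seq.upper()
    half = length // 2
    top = [seq[i:i + length] for i in range(0, len(seq), length)]
    cuts = [0] + list(range(half, len(seq), length)) + [len(seq)]
    bottom = [revcomp(seq[a:b]) for a, b in zip(cuts, cuts[1:]) if b > a]
    return top, bottom


if __name__ == "__main__":
    toy = "ATGGCTAAGTTGACCAGTGCAGTTCCAGTGCTCACAGCCA" * 3  # hypothetical 120-nt sequence
    top, bottom = tile_oligos(toy, 60)
    print([len(o) for o in top], [len(o) for o in bottom])
```

Calling `tile_oligos(seq, 40)` gives the 40/20 layout described for the LacZ parts in Example 5. When the sequence length is not a multiple of the oligonucleotide length, the last oligonucleotides are simply shorter, which a real design would need to adjust.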

The amplified and purified fragment (1068 bp) was then digested with the BspHI and NheI restriction enzymes and cloned into the vector pMOD2 LacZ(wt), in which the NcoI site of pMOD1 is replaced with a BspHI site. E. coli clones containing this recombinant vector were selected on FastMedia™ Hygro Agar medium (Cayla). The sequence SEQ ID No. 3 in FIG. 3 was confirmed by sequencing both strands of the plasmid DNA of two hygromycin-resistant clones. This synthetic gene, placed under the control of the bacterial EM7 promoter (vector pMOD2 Hph ΔCpG), confers hygromycin B resistance at least equal to that provided by the same vector containing the native Hph gene in the recipient E. coli strain GT100.
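A BspHI-cut fragment can be ligated to an NcoI-cut end (here, and again when the Neo ΔCpG fragment is moved into pSh ΔCpG in Example 11) because both enzymes leave the same 4-base 5' overhang. The Python sketch below derives the overhangs from the recognition sequences and their standard cut positions; the helper names are hypothetical, and it handles only palindromic sites that leave 5' overhangs.

```python
# Recognition sites and top-strand cut offsets (C^CATGG -> cut after 1 base).
ENZYMES = {
    "NcoI": ("CCATGG", 1),
    "BspHI": ("TCATGA", 1),
    "NheI": ("GCTAGC", 1),
    "XhoI": ("CTCGAG", 1),
    "SalI": ("GTCGAC", 1),
}


def overhang(enzyme: str) -> str:
    """5' overhang of a palindromic site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    assert cut < len(site) / 2, "sketch handles 5' overhangs only"
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Ends ligate if their overhangs pair; these overhangs are palindromic,
    so pairing is equivalent to being identical."""
    return overhang(a) == overhang(b)


def hybrid_site(left: str, right: str) -> str:
    """Sequence formed when a `left` end is ligated to a `right` end."""
    left_site, left_cut = ENZYMES[left]
    right_site, right_cut = ENZYMES[right]
    return left_site[:left_cut] + overhang(left) + right_site[len(right_site) - right_cut:]


if __name__ == "__main__":
    print(overhang("NcoI"), overhang("BspHI"), compatible("NcoI", "BspHI"))
    print(hybrid_site("NcoI", "BspHI"))
```

The same check shows why the NotI-XhoI element of Example 10 can be cloned into pKO3 opened with NotI and SalI, since XhoI and SalI both leave TCGA. The junction formed by an NcoI end and a BspHI end (CCATGA) is recognized by neither enzyme.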

EXAMPLE 3 Construction of the Bsr Gene for Resistance to Blasticidin Devoid of CpG

The Bsr ΔCpG gene, the sequence of which is given in FIG. 5 (SEQ ID No. 5), was synthesized from the oligonucleotides indicated in FIG. 6, following the method described in Example 1. The assembled oligonucleotides were amplified with the primers OL64 and OL76 (cf. FIG. 6). The amplified and purified fragment was then digested with the BspHI and NheI restriction enzymes and cloned into the vector pMOD2 LacZ(wt). E. coli clones containing this recombinant vector were selected on FastMedia™ Blasti Agar medium (Cayla). The sequence SEQ ID No. 5 in FIG. 5 was confirmed by sequencing both strands of the plasmid DNA of two blasticidin-resistant clones.

This synthetic gene, placed under the control of the bacterial EM7 promoter (vector pMOD2 Bsr ΔCpG), confers blasticidin resistance identical to that provided by the same vector containing the native Bsr gene in the recipient E. coli strain GT100.

EXAMPLE 4 Construction of the Pac Gene for Resistance to Puromycin Devoid of CpG

The BspHI-NheI fragment (Pac ΔCpG gene; SEQ ID No. 7), the sequence of which is given in FIG. 7, was synthesized by assembling the oligonucleotides indicated in FIG. 8.

The assembled oligonucleotides were amplified with the sense primer pur24 (AGGACCATCATGACTGAG; SEQ ID No. 301) and the reverse primer pur25 (ATCATGTCGAGCTAGCTC; SEQ ID No. 302). The purified BspHI-NheI fragment was cloned into the plasmid pMOD2LacZ (wt) between the BspHI and NheI sites. After transformation of the GT100 strain with the ligation product of the vector fragment and the PCR fragment, the plasmid DNA sequences of 2 puromycin-resistant clones that appeared on FastMedia™ Puro Agar medium (Cayla) were found to match the desired sequence given in FIG. 7. The synthetic Pac ΔCpG gene, placed under the control of the bacterial EM7 promoter (vector pMOD2 Pac ΔCpG), confers puromycin resistance slightly greater than that provided by the same vector containing the native pac gene in the recipient E. coli strain GT100.
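Both primers carry a cloning site near their 5' end. The Python sketch below, built on the standard `re` module with hypothetical helper names, locates the BspHI and NheI sites and any CpG in each primer. It shows that pur25 does contain a CpG, but only in the 5' tail upstream of the NheI site (G^CTAGC, so the cut falls after the G at index 10); that tail is removed when the PCR product is digested and does not end up in the cloned fragment.

```python
import re

SITES = {"BspHI": "TCATGA", "NheI": "GCTAGC"}


def find_all(pattern: str, seq: str) -> list[int]:
    """Start positions of possibly overlapping matches of `pattern` in `seq`."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def scan_primer(primer: str) -> dict[str, list[int]]:
    """Positions of the cloning sites and of any CpG within a primer."""
    hits = {name: find_all(site, primer) for name, site in SITES.items()}
    hits["CpG"] = find_all("CG", primer)
    return hits


pur24 = "AGGACCATCATGACTGAG"  # SEQ ID No. 301
pur25 = "ATCATGTCGAGCTAGCTC"  # SEQ ID No. 302

if __name__ == "__main__":
    for name, primer in (("pur24", pur24), ("pur25", pur25)):
        print(name, scan_primer(primer))
```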

EXAMPLE 5 Construction of the LacZ Gene Devoid of CpG Encoding β-Galactosidase of E. coli

The synthetic LacZ ΔCpG gene (SEQ ID No. 9, given in FIG. 9) was constructed according to the method described in the preceding examples. Given the size of the gene (more than 3000 bp), the construction was carried out in 3 separate parts, retaining the EcoRV and SacI restriction sites at the same positions as in the native lacZ sequence. For each part, the two strands were synthesized using oligonucleotides of 40 bases plus two oligonucleotides of 20 bases, with an overlapping region of 20 bases.

The first region corresponds to the NcoI-EcoRV fragment (Part I), the second to the EcoRV-SacI fragment (Part II) and the third to the SacI-NheI fragment (Part III). The oligonucleotides given in FIGS. 10A (Part I), 10B (Part II) and 10C (Part III) were assembled by PCR under the same experimental conditions as in the preceding examples. The three parts of the synthetic gene were cloned successively into the vector pMOD1 LacZ (wt). The functionality of each cloned part, and of the complete synthetic gene on the vector pMOD1 LacZ, was demonstrated by detecting the β-galactosidase activity of the recombinant clones obtained in the MC1061ΔLac strain on FastMedia™ Amp Xgal Agar medium (Cayla). The complete synthetic LacZ ΔCpG gene, placed under the control of the EM7 promoter, gives 30% less β-galactosidase activity (luminometric assay of protein extracts from culture) than the native LacZ gene expressed in the same plasmid context.

EXAMPLE 6 Construction of a Strong Constitutive Promoter for E. coli, Devoid of CpG

The bacterial EM7 promoter present on vectors of the pMOD1 type is a strong, constitutive synthetic promoter in E. coli. Its sequence, which contains 3 CpGs (SEQ ID No. 297 in FIG. 11), was used as a reference to produce a bacterial promoter devoid of CpG. We produced “linker” oligonucleotides that were degenerate at 4 positions (indicated W, D, W and H on the sequence SEQ ID No. 298 in FIG. 11) and compatible with the AseI and NcoI restriction sites. These oligonucleotides were hybridized and cloned into pMOD1 ShΔCpG between the AseI and NcoI restriction sites of the EM7 promoter. After selection of the recombinant clones on FastMedia™ Zeo Agar medium and sequencing of the promoter of the most zeocin-resistant clone, we selected the EM2K promoter (sequence SEQ ID No. 11 in FIG. 11) as the bacterial promoter devoid of CpG.
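Four degenerate positions give a family of linkers rather than a single sequence. With the standard IUPAC codes (W = A/T, D = A/G/T, H = A/C/T), a W, D, W, H pattern encodes 2 × 3 × 2 × 3 = 36 distinct linkers. The Python sketch below enumerates such a family and separates the members that would reintroduce a CpG; the 8-nt motif is made up for illustration and is not the sequence of SEQ ID No. 298.

```python
from itertools import product

# IUPAC nucleotide codes for degenerate positions
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate: str) -> list[str]:
    """All concrete sequences encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical motif with W, D, W, H positions (not SEQ ID No. 298)
motif = "AWCDTWGH"
variants = expand(motif)
cpg_free = [v for v in variants if "CG" not in v]

if __name__ == "__main__":
    print(len(variants), "variants,", len(cpg_free), "without CpG")
```

In this toy motif only the D that follows a C can create a CpG (when D = G), which removes 12 of the 36 variants.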

EXAMPLE 7 Synthesis of the R6K Gamma Origins Devoid of CpG

The PacI DNA fragment containing the R6K gamma M2A origin (SEQ ID No. 12 in FIG. 12) was synthesized by PCR from the assembly of the oligonucleotides indicated in FIG. 13. The assembled R6K gamma M2A fragment was amplified with the primers RK15 (GCAGGACTGAGGCTTAATTAAACCTTAAAAC; SEQ ID No. 303) and RK16 (AAGTCTCCAGGTTAATTAAGATCAGCAGTTC; SEQ ID No. 304) and, after digestion with the PacI enzyme, the fragments were cloned into a plasmid (pGTCMVneo) containing the kanamycin resistance gene and the pUC origin of replication bordered by 2 PacI sites. Many transformants of the GT97 strain (which expresses the pi protein) were analyzed, and only clones containing a high-copy plasmid that was retained after several rounds of subculturing in the absence of kanamycin were selected. Sequencing showed that the ori fragment of most of these plasmids carried fewer repeat sequences (5 or 6) than the 7 of the natural R6K origin. One of these novel CpG-free sequences of the synthetic R6K gamma origin is given as SEQ ID No. 13 in FIG. 14.

Two other versions of the R6K gamma origin were synthesized in a similar manner, in which the G of each CpG present in the repeat sequences (a 22 bp element repeated several times in the pi protein-binding region) is replaced with a C, giving the R6K gamma M2C origin, or with a T, giving the R6K gamma M2T origin. The functionality of these novel origins, together with that of the M2A origin assembled from the oligonucleotides of FIG. 13, in which the G is replaced with an A, shows that the CpGs of these repeat sequences are not required for origin function.
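The substitution rule of this example (and of claim 1) is easy to express in code, and doing so exposes a subtlety: replacing the G of a CpG with A or T can never create a new CpG, but replacing it with C creates one whenever the next base is G; symmetrically, replacing the C with G creates one after a preceding C. The Python sketch below applies the rule and recounts. The helper names are hypothetical and the sequences are short made-up examples, not the 22 bp repeat of the R6K gamma origin.

```python
def replace_cpg(seq: str, new_base: str, target: str = "G") -> str:
    """Replace the G (target="G") or the C (target="C") of every CpG.

    CpG positions are found on the original sequence, so every CpG present
    before editing is mutated exactly once.
    """
    allowed = {"G": "ACT", "C": "GAT"}[target]
    if new_base not in allowed:
        raise ValueError(f"the {target} of a CpG must become one of {allowed}")
    s = list(seq.upper())
    sites = [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
    for i in sites:
        s[i + 1 if target == "G" else i] = new_base
    return "".join(s)


def count_cpg(seq: str) -> int:
    """Number of CG dinucleotides in `seq`."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")


if __name__ == "__main__":
    for base in "ACT":
        edited = replace_cpg("TCGGA", base)
        print(base, edited, "residual CpG:", count_cpg(edited))
```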

EXAMPLE 8 Assembly of Plasmid Vectors Completely Devoid of CpG, Expressing a Gene for Resistance in E. coli

Firstly, a PacI-PacI cassette containing the bacterial EM2K promoter and the Sh ΔCpG zeocin resistance gene followed by a CpG-free bacterial terminator was prepared. For this, “linker” oligonucleotides containing the sequence of the t1 terminator of the intergenic region rpsO-pnp of E. coli were hybridized and cloned between the NheI and PacI sites of the vector pMOD1 EM2K Sh ΔCpG: “linker” oligonucleotides:

rpsO-1 (5′->3′): CTAGCTGAGTTTCAGAAAAGGGGGCCTGAGTGGCCCCTTTTTTCAACTTAAT (SEQ ID No. 305)
rpsO-2 (5′->3′): TAAGTTGAAAAAAGGGGCCACTCAGGCCCCCTTTTCTGAAACTCAG (SEQ ID No. 306)
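The two linker strands can be checked against the cloning sites. The Python sketch below assumes that rpsO-1 carries a 4-nt 5' overhang for the NheI end and a 2-nt 3' overhang for the PacI end (PacI, TTAAT^TAA, leaves a 3' AT overhang). Under that reading rpsO-2 is exactly the reverse complement of the remaining core, both sites are regenerated on ligation, and neither strand contains a CpG.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


rpso_1 = "CTAGCTGAGTTTCAGAAAAGGGGGCCTGAGTGGCCCCTTTTTTCAACTTAAT"  # SEQ ID No. 305
rpso_2 = "TAAGTTGAAAAAAGGGGCCACTCAGGCCCCCTTTTCTGAAACTCAG"        # SEQ ID No. 306

# Assumed layout: 4-nt 5' overhang (NheI side), paired core, 2-nt 3' overhang (PacI side)
left_overhang, core, right_overhang = rpso_1[:4], rpso_1[4:-2], rpso_1[-2:]

if __name__ == "__main__":
    print("rpsO-2 pairs with core:", revcomp(core) == rpso_2)
    print("5' overhang:", left_overhang, "| 3' overhang:", right_overhang)
    # Vector NheI end contributes G; vector PacI end contributes TAA
    print("NheI site restored:", ("G" + rpso_1).startswith("GCTAGC"))
    print("PacI site restored:", (rpso_1 + "TAA").endswith("TTAATTAA"))
    print("CpG in linker:", "CG" in rpso_1 or "CG" in rpso_2)
```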

The recombinant vector obtained (pMOD1 EM2K Sh ΔCpG Term) was verified by sequencing across the terminator, a sequence that naturally contains no CpG. The EM2K-Sh ΔCpG-Term cassette contained in this vector was then amplified by PCR, adding a PacI site at each end, using the following primers:

PACI-UP (5′->3′): ATCGTTAATTAAAACAGTAGTTGACAATTAAACATTGGC (SEQ ID No. 307)
PACI-DOWN (5′->3′): ATCGTTAATTAAGTTGAAAAAAGGGGCC (SEQ ID No. 308)

This amplified fragment was purified, cleaved with PacI and ligated to the PacI fragment containing the R6K gamma ΔCpG origin described in Example 7. After transformation of this ligation mixture into the GT97 strain (which expresses the pi protein) and selection on FastMedia™ Zeo medium, analysis of the recombinant clones revealed both possible orientations of the PacI-PacI fragment containing the R6K gamma ΔCpG origin. The orientation selected in pSh ΔCpG is shown in FIG. 16.

EXAMPLE 9 Assembly of a Plasmid Vector Completely Devoid of CpG, Expressing the Zeocin Resistance Gene and the β-Galactosidase Gene in E. coli

The vector pSh ΔCpG described in Example 8 (FIG. 16) was used to insert the synthetic LacZ gene devoid of CpG between the EcoRI and NheI sites. For this, EcoRI- and NcoI-compatible “linker” oligonucleotides containing a ribosome-binding site consensus sequence from E. coli were hybridized and cloned with the NcoI-NheI LacZΔCpG fragment of pMOD1 LacZΔCpG, between the EcoRI and NheI sites of the vector pMOD1 EM2K ShΔCpG. “linker” oligonucleotides used:

rbs-1 (5′->3′): AATTCTGAGGAGAAGCT (SEQ ID No. 309)
rbs-2 (5′->3′): CATGAGCTTCTCCTCAG (SEQ ID No. 310)

Transformation of this ligation mixture into the GT97 strain (which expresses the pi protein) and selection on FastMedia™ Zeo Xgal medium made it possible to obtain the recombinant clones containing the vector pSh-LacZΔCpG (FIGS. 15 and 17). This vector co-expresses, under the control of the bacterial EM2K promoter, in an artificial operon system, the ShΔCpG and LacZΔCpG genes.

EXAMPLE 10 Production of an E. coli Strain Expressing the Mutant Protein pi116 and Carrying a Deletion in the dcm Gene

The pir gene, which encodes the pi protein essential for initiating replication from the R6K gamma origin, and the mutated pir116 gene, which increases the copy number of R6K gamma plasmids, have been introduced in functional form into various E. coli K12 strains by various groups. Strains of this type can be obtained from the E. coli Genetic Stock Center (http://cgsc.biology.yale.edu) and are also commercially available from companies that supply biological material for research. This is the case, for example, of the pir1 (pir116) and pir2 (wild-type pir) strains provided by the company Invitrogen, whose products can be purchased in all European countries. The GT97 strain of the K12 line, which has the genotype Δlac169 hsdR514 endA1 recA1 codBa uidA(ΔMluI)::pir116 (available from InvivoGen), was chosen from among several K12 pir strains of differing genotypes for its ease of use, the consistency of its R6K gamma plasmid DNA preparations and its high competence. A deletion was introduced into the dcm gene of the GT97 strain as follows:

Two DNA regions, of 1.8 kb and 1.5 kb, flanking respectively the ATG initiation codon (fragment A) and the TGA stop codon (fragment B) of the dcm gene, were amplified by PCR. Fragment A was amplified with the primer pair OLdcmAF (TTTTGCGGCCGCTTGCTGCGCCAGCAACTAATAACG; SEQ ID No. 311) and OLdcmAR (CCTTGGATCCTGGTAAACACGCACTGTCCGCCAATCGATTC; SEQ ID No. 312), and fragment B with the primer pair OLdcmBF (TTTTGGATCCTCAGCAAGAGGCACAACATG; SEQ ID No. 313) and OLdcmBR (TTTTCTCGAGAAACGGCAGCTCTGATACTTGCTTC; SEQ ID No. 314). Restriction sites for the NotI (GCGGCCGC), BamHI (GGATCC) and XhoI (CTCGAG) enzymes were introduced into the primers so that fragments A and B could be joined to each other, forming a genetic element flanked by NotI and XhoI sites. The dcm gene region is thus reconstituted with a deletion extending from position +3 after the ATG to position −14 before the TGA. This genetic element was cloned between the NotI and SalI sites of pKO3 (Link A. J., Phillips D. and Church G. M. (1997) J Bacteriol 179, 6228-37), a vector with temperature-sensitive replication developed for allele replacement in Escherichia coli, to give the plasmid pKO3Δdcm. Because GT97 carries the recA1 mutation, it was co-transformed with this plasmid and with a plasmid expressing the RecA protein (pFL352). A transformant containing both plasmids was cultured at the nonpermissive temperature (42° C.) in the presence of chloramphenicol in order to select clones that had integrated pKO3Δdcm into the bacterial chromosome by homologous recombination. A subclone resistant to chloramphenicol at 42° C. was then cultured at 30° C. on a medium containing a high concentration of sucrose (5%) in order to counter-select cells still carrying the integrated plasmid and thus recover strains which, after a second homologous recombination event, had exchanged the chromosomal region of the dcm gene for the homologous fragment cloned into the plasmid. The deletion in the selected clone (GT106) was verified by PCR with the primer pair OLdcmAF and OLdcmBR, which generates a smaller fragment than that obtained with the parental strain, and by a PCR with the primer OLdcmBR and a primer positioned outside the exchanged region (OLdcmCF TTTTGCGGCCGCGTTGCGGTATTACCCTTGTC; SEQ ID No. 315).

The dcm⁻ genotype of the GT106 strain was confirmed by introducing into GT97 and into GT106 a plasmid containing a restriction site for the SexAI enzyme, whose cleavage is blocked by dcm methylation. The plasmid purified from GT106 is cleaved by SexAI, whereas it is resistant to the enzyme when purified from GT97.
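The SexAI test works because the SexAI recognition sequence, ACCWGGT, contains the Dcm target CCWGG at its centre, so every SexAI site in a plasmid grown in a dcm⁺ strain carries a methylated cytosine. The Python sketch below, with hypothetical helper names and a made-up test sequence, locates both motifs and confirms that each SexAI site overlaps a Dcm site. Both motifs read the same on either strand (given W = A or T), so a single-strand scan suffices.

```python
import re

SEXAI = "ACC[AT]GGT"  # SexAI recognition site A^CCWGGT (W = A or T)
DCM = "CC[AT]GG"      # Dcm methylates the internal C of CCWGG


def sites(pattern: str, seq: str) -> list[int]:
    """0-based starts of (possibly overlapping) matches on the given strand."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def sexai_blocked_by_dcm(seq: str) -> list[bool]:
    """For each SexAI site, whether a Dcm site starts one base inside it."""
    dcm = set(sites(DCM, seq))
    return [start + 1 in dcm for start in sites(SEXAI, seq)]


# Hypothetical sequence with two SexAI sites (one CCAGG, one CCTGG core)
example = "GGACCAGGTTTACCTGGTAA"

if __name__ == "__main__":
    print(sites(SEXAI, example), sites(DCM, example), sexai_blocked_by_dcm(example))
```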

The GT106 strain exhibits the same growth characteristics as the parental strain GT97 and, as expected, no reduction in the yield of R6K gamma plasmid DNA was observed; only the quality of the DNA, assessed by the absence of cytosine methylation at dcm sites, was improved. The GT106 strain will be available from the company Invivogen from the filing date of this patent application.

EXAMPLE 11 Production of the Neo Gene for Resistance to Neomycin, Devoid of CpG

The Neo ΔCpG gene, the sequence of which is given in FIG. 18 (positions 3 to 797 of the DNA sequence given in FIG. 18 = SEQ ID No. 316; protein sequence = SEQ ID No. 317), was synthesized from an assembly of overlapping oligonucleotides (20-40 bp in size), the sequences of which are given in FIG. 19. The assembly method is carried out in three steps: first, the oligonucleotides of the coding strand are phosphorylated; second, all the oligonucleotides of both strands are combined by hybridization and ligation; and third, the gene is amplified by PCR.

The 20 oligonucleotides of the coding strand, SEQ ID No. 319 to SEQ ID No. 338 (FIG. 19), are phosphorylated according to the following procedure: 1 μl of each oligonucleotide, taken up in water at 250 μM, is added to a microtube containing 50 μl of water, so as to bring the final solution to a concentration of 100 picomol per microliter. 5 μl of this solution are then mixed with 10 μl of 10-times concentrated polynucleotide kinase buffer, 0.4 μl of a 50 mM ATP solution, 85 μl of water and 1 μl of the enzyme (at 10 U/μl), and the entire mixture is incubated for 4 hours at 37° C. and then 5 minutes at 95° C. (solution A).

A solution of the oligonucleotides of the noncoding strand is made up by mixing 1 μl of each oligonucleotide (SEQ ID No. 339 to SEQ ID No. 360; FIG. 19) and 1 μl of the oligonucleotide SEQ ID No. 318 (FIG. 19), to which solution 160 μl of water are added in order to obtain a final solution at 54 picomol per μl (solution B).

The gene is assembled by first mixing 10 μl of solution A, 1 μl of solution B, 6 μl of a 100 mM KCl solution, 3 μl of a 0.5% solution of the surfactant NP-40, 4 μl of a 50 mM MgCl₂ solution, 3 μl of a 10 mM ATP solution and 7.5 μl of Pfu ligase (30 units). The mixture is then heated in a programmable thermocycler for 3 minutes at 95° C. and then 3 minutes at 80° C., before undergoing 3 cycles of 1 minute at 95° C., followed by a ramp from 95° C. to 70° C. in 1 minute, then a ramp from 70° C. to 55° C. in 1 hour and, finally, 2 hours at 55° C. The assembled oligonucleotides are then amplified with the primers NO1 and NO22. The amplification product is purified on a Promega column, digested with the BspHI and NheI restriction enzymes and cloned into the plasmid pMOD2LacZ(wt) linearized with BspHI and NheI. The E. coli strain GT100 (available from Invivogen) was transformed with the ligation mixture of the vector fragment and the PCR fragment, and the plasmid DNA sequences of 2 kanamycin-resistant clones were found to match the sequence given in FIG. 18. This synthetic gene, placed under the control of the bacterial EM7 promoter (vector pMOD2Neo ΔCpG), confers kanamycin resistance identical to that provided by the same vector containing the native neo gene in the recipient E. coli strain GT100. The BspHI-NheI neo fragment of the plasmid pMOD2Neo ΔCpG was then introduced into the plasmid pSh ΔCpG of FIG. 16, linearized with NcoI and NheI, to give, after ligation and transformation in E. coli, the plasmid pNeoΔCpG.

1. An origin of replication for a plasmid, wherein its sequence corresponds to that of the R6K gamma origin of replication in which each G of the CpGs of the repeat region of the core has been replaced with an A, a C or a T, or each C of the CpGs has been replaced with a G, an A or a T.
2. The origin of replication as claimed in claim 1, wherein its sequence comprises the sequence SEQ ID NO: 12 or the sequence SEQ ID NO: 13.
3. The origin of replication as claimed in claim 1, wherein the pi protein-binding sequence is repeated 5 or 6 times.
4. A promoter whose sequence comprises the sequence SEQ ID NO: 11.
5. A plasmid comprising an origin of replication as claimed in claim 1.
6. The plasmid as claimed in claim 5, being completely devoid of CpG.
7. A plasmid of SEQ ID NO: 14.
8. A method for producing a plasmid completely devoid of CpG and free of methylation on cytosine in the nucleic acid context CC(A/T)GG, wherein a plasmid as claimed in claim 5 is produced by replication in an Escherichia coli strain expressing the pi protein, which is deficient for the dcm methylation system.
9. An Escherichia coli cell transformed with the plasmids as claimed in claim 5.
10. The transformed Escherichia coli cell as claimed in claim 9, wherein it expresses a gene encoding a pi protein.
11. The transformed Escherichia coli cell as claimed in claim 10, which further comprises an inactivated dcm gene.
12. A kit for producing plasmids, comprising at least one cell as claimed in claim 9.